Abstract

The plenoptic function describes the visual information available to an observer at any point in space 
and time. Samples of the plenoptic function (POF) are seen in video and in general visual content (images, 
mosaics, panoramic scenes, etc.), and represent large amounts of information. In this paper we propose
a stochastic model to study the compression limits of a simplified version of the plenoptic function. In
the proposed framework, we isolate the two fundamental sources of information in the POF: one
representing the camera motion and the other representing the information complexity of the "reality"
being acquired and transmitted. The sources of information are combined, generating a stochastic process 
that we study in detail. 

We first propose a model for ensembles of realities that do not change over time. The proposed 
model is simple in that it enables us to derive precise coding bounds in the information-theoretic sense 
that are sharp in a number of cases of practical interest. For this simple case of static realities and 
camera motion, our results indicate that coding practice is in accordance with optimal coding from an 
information-theoretic standpoint. 

The model is further extended to account for visual realities that change over time. We derive bounds 
on the lossless and lossy information rates for this dynamic reality model, stating conditions under which 
the bounds are tight. Examples with synthetic sources suggest that within our proposed model, common 
hybrid coding using motion/displacement estimation with DPCM performs considerably suboptimally 
relative to the true rate-distortion bound. 
I. Introduction

A. Background 

Consider a moving camera that takes sample snapshots of an environment over time. The samples are 
to be coded for transmission or storage. Because the movements of the camera are small relative to the 
scene, there are large correlations among multiple acquisitions. 

Examples of such scenarios include video compression and the compression of light fields. More
generally, the compression problem in these examples can be seen as representing and compressing
samples of the plenoptic function [2]. The 7-D plenoptic function (POF) describes the light intensity
passing through every viewpoint, in every direction, for all times, and for every wavelength. Thus, the
samples of the plenoptic function can be used to reconstruct a view of reality at the decoder. The POF
is usually denoted by $\mathrm{POF}(x, y, z, \theta, \phi, t, \lambda)$, where $(x, y, z)$ represents a point in 3-D space, $(\theta, \phi)$
characterizes the direction of the light rays, $t$ denotes time, and $\lambda$ denotes the wavelength of the light
rays. The POF is usually parametrized in order to reduce its number of dimensions. This is common in
image-based rendering [?], [?]. Examples of POF parametrizations include digital video, the light field
and lumigraph [5], [15], concentric mosaics [?], and the surface plenoptic function [8]. Regardless of the
parametrization, compression is essential because of the large size of the data set.

In this work, we consider the plenoptic function in terms of a spatial position and a time dimension. 
Thus, our initial setup is that of POF(x, y, z, t). We also assume that we do not have information on the 
constituent dimensions, but rather we are given a sampled plenoptic function that needs to be compressed.
A typical scenario involves a camera traversing the domain of the POF and acquiring its samples to be 
compressed and then stored for later rendering. The information to be compressed is thus POF(W(t),t) 
where the trajectory W(t) collectively represents a sequence of positions and angles where light rays are 
acquired. In such a context, it is crucial to know the compression limits and how the parameters involved 
influence such limits. This then provides a benchmark to assess the performance of compression schemes 
for such data sets. 

B. Prior art 

The practical aspects of compressing video and other examples of the plenoptic function have been
studied extensively (see, e.g., [?], [?], [10], [11], and references therein). However, comparatively little work
has addressed rate-distortion analysis and the general question of how many bits are needed
to code such a source. Due to the complexity inherent to visual data, the source is difficult to model
statistically. As a result, precise information rates are difficult to obtain. Several statistical models have
been proposed to analyze video sources [12], [13], [14], [15]. Often, one obtains the rate-distortion
behavior resulting from a particular coding method, such as the hybrid coder used in video. For instance,
the work in [16] analyzes the rate-distortion performance of hybrid coders using a Gauss-Markov model
for the video sequence as well as for the prediction error that is transmitted as side information. A similar
rate-distortion analysis for light-field compression is done in [17]. Such models are interesting, but they
assume predictive coding from the start, and thus they do not reveal the intrinsic
information rate of the visual source.

The compression of the POF is also studied in [18], but in a distributed setting. Using piecewise
smooth models, the authors derive operational rate-distortion bounds based on a parametric sampling
model. Another scenario of POF coding is studied in [19].
Fig. 1. The problem under consideration. There is a world and a camera that produces a "view of reality" that needs to be
coded with finite or infinite memory. 



C. Paper contributions 

The general problem can be posed as shown in Figure 1. There is a physical world or "reality" (e.g.,
scenes, objects, moving objects), and a camera that generates a "view of reality" $V$. This "view of reality"
(e.g., a video sequence) is coded with a source coder with memory $M$, giving an average rate of $R$ bits
per sample. This bitstream is decoded with a decoder with memory $M$ to reconstruct a view of reality
$\hat{V}$ close to the original one in the MSE sense. Here we refer to memory and rate in a loose sense. Precise
definitions of memory and rate are given in Section II-B.

In this paper we propose a simplified stochastic model for the plenoptic function that retains the essential
elements of the general case. We take the viewpoint that video can be seen as a 3-D slice of the POF. Our
approach is to formulate a statistical model for video data generation and, within that model, to establish
information rate bounds. We first propose a model in which the background scene is drawn randomly at a
prior time, but otherwise does not change as time progresses. Within this "static reality" model we develop
information rates for the lossless and lossy cases. Furthermore, we compute the conditional information
rate that provides a coding limit when memory resources are constrained. We then extend the model to
account for background scene changes, proposing a "dynamic reality" model based on a Markov
random field, and we compute bounds on its information rates. For the Gaussian case, we compute lower
and upper bounds that are tight in the high-SNR regime. Examples validating our theoretical findings are
presented.

August 7, 2009

DRAFT

IEEE TRANSACTIONS ON INFORMATION THEORY

The models proposed and studied in this paper rely on several assumptions that make the problem
mathematically tractable. Our goal here is to make assumptions that simplify the problem but still keep
the main elements of the general problem of compressing data from a moving camera. While the resulting
models are not a perfect depiction of reality, we believe they have merit, as they provide a framework to
investigate such processes. Moreover, our assumptions allow us to derive coding bounds that, to the
best of our knowledge, were previously unknown, even for our very simplified models.

The paper is organized as follows. Section II sets up the problem and introduces notation. The video
coding problem is treated in Sections III and IV. We present results for the static reality case in Section
III and treat the dynamic case in Section IV. Concluding remarks are made in Section V.

II. Definitions and Problem Setup 

A. Simplified model 

We describe a simplified model for the process displayed in Figure 1. Consider a camera moving
according to a Bernoulli random walk. The random walk is defined as follows:

Definition 1: The Bernoulli random walk is the process $W = (W_t : t \in \mathbb{Z}^+)$ such that $\Pr\{W_0 = 0\} = 1$ and, for $t \ge 1$,
$$W_t = \sum_{i=1}^{t} N_i,$$
where $\{N_i\}$ are drawn i.i.d. from the set $\{-1, 1\}$ with probability distribution $\Pr\{N_i = 1\} = p_W$.

We assume without loss of generality that $p_W \le 0.5$. Moreover, throughout the paper, the index $t$ is
considered a discrete-valued variable.

In front of the camera there is an infinite wall that represents a scene that is projected onto a screen
in front of the camera path (i.e., we ignore occlusion). The wall is modelled as a 1-D strip "painted"
with an i.i.d. process $X = (X_n : n \in \mathbb{Z})$ that is independent of the random walk $W$. The process $X$
follows some probability distribution $p_X$, drawing values from an alphabet $\mathcal{X}$. Here we focus on the rather
unrealistic i.i.d. case due to its simplicity; generalization to stationary processes is left for future work. In
the static case, the wall process $X$ is drawn at $t = 0$. Figure 2(a) illustrates the proposed model.
Fig. 2. A stochastic model for video. (a) Simplified model. (b) The resulting vector process $V$. Each sample of the vector
process is a block of $L$ samples from the process $X$ taken at the position indicated by the random walk $W_t$. In the figure, $L = 4$.



At each random walk time step, the camera sees a block of $L$ samples from the infinite wall, where
$L > 1$. This results in a vector process $V = (V_t : t \in \mathbb{Z}^+)$ indexed by the random walk positions, as
defined below.

Definition 2: Let $W$ be a random walk independent of $X$, and let $L$ be an integer greater than one.
The vector process $V = (V_t : t \in \mathbb{Z}^+)$ is defined as
$$V_t := (X_{W_t}, X_{W_t+1}, \ldots, X_{W_t+L-1}). \quad (1)$$

The random walk is a simple stochastic model for an ensemble of camera movements. It includes
camera panning as a special case, i.e., when $p_W = 0$. Because it is restricted to unit displacements, the model
neglects other effects such as zooming, rotation, and changes of viewing angle.

Notice that consecutive samples of the vector process, which are vectors of length $L$, share at least
$L - 1$ samples. Furthermore, because the process $X$ is i.i.d., it follows that the vector
process $V$ is stationary and mean-ergodic. Figure 2(b) illustrates the vector process $V$.
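A minimal sketch of Definitions 1 and 2, assuming a wall that is i.i.d. uniform over a small integer alphabet. The helper names (`bernoulli_walk`, `vector_process`) and the lazily drawn, dictionary-backed wall are illustrative choices, not part of the model. The final loop checks the overlap property noted above: a $+1$ step shifts the window right and a $-1$ step shifts it left, so consecutive vectors share $L - 1$ samples.

```python
import random


def bernoulli_walk(t_max, p_w, rng):
    """Return [W_0, ..., W_{t_max}] with W_0 = 0 and i.i.d. steps N_i in {-1, +1}, Pr{N_i = +1} = p_w."""
    walk = [0]
    for _ in range(t_max):
        walk.append(walk[-1] + (1 if rng.random() < p_w else -1))
    return walk


def vector_process(walk, L, alphabet_size, rng):
    """Return [V_0, ..., V_t] with V_t = (X_{W_t}, ..., X_{W_t + L - 1}) as in (1).

    Illustrative assumption: the i.i.d. wall X is uniform and drawn lazily; a
    position receives a value the first time a window covers it and keeps that
    value afterwards (static reality).
    """
    wall = {}

    def x(n):
        if n not in wall:
            wall[n] = rng.randrange(alphabet_size)
        return wall[n]

    return [tuple(x(w + j) for j in range(L)) for w in walk]


rng = random.Random(1)
W = bernoulli_walk(t_max=20, p_w=0.3, rng=rng)
V = vector_process(W, L=4, alphabet_size=256, rng=rng)

# Consecutive vectors share L - 1 samples: a +1 step shifts the window right,
# a -1 step shifts it left.
for t in range(1, len(V)):
    if W[t] - W[t - 1] == 1:
        assert V[t][:-1] == V[t - 1][1:]
    else:
        assert V[t][1:] == V[t - 1][:-1]
```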

B. The video coding problem 

Given the vector process $V = (V_0, V_1, \ldots)$, the coding problem consists in finding an encoder/decoder
pair that is able to describe and reproduce the process $V$ at the decoder using no more than $R$ bits
per vector sample. The decoder reproduces the vector process $\hat{V} = (\hat{V}_0, \hat{V}_1, \ldots)$ with some delay. The
reproduction can be lossless or lossy with fidelity $D$. The encoder encodes each sample $V_t$ based on the
observation of the $M$ previous vector samples $V_{t-1}, \ldots, V_{t-M}$. Thus, $M$ is the memory of the encoder/decoder.
Because encoding is done jointly over blocks of samples, a delay is incurred. The lossless and lossy information rates of the
process $V$ provide the minimum rate needed either to perfectly reproduce the process $V$ at the decoder
or to reproduce it within distortion $D$, respectively. Note that the information rate (lossless or lossy) is
usually only achievable at the expense of infinite memory and delay [20].

C. Properties of the random walk 

The following notions are needed in this paper. 

Definition 3: Let $W$ be a random walk. The set of recurrent paths of length $t$ is the set
$$\mathcal{R}^t := \{(W_0, W_1, \ldots, W_t) : W_t = W_s \text{ for some } s,\ 0 \le s < t\}.$$
If a path belongs to $\mathcal{R}^t$, we call it a recurrent path. We call $\Pr\{\mathcal{R}^t\}$ the probability of recurrence at step
$t$.

The probability of the complementary set, $\Pr\{\bar{\mathcal{R}}^t\}$, is called the first-passage probability. When a site
$W_t$ has not occurred before, we refer to it as a new site. A related quantity is the probability of return.
Definition 4: Let $W$ be a random walk, and let $t > s \ge 0$. Consider the set
$$\mathcal{T}_s^t := \{(W_0, W_1, \ldots, W_t) : W_t = W_s \text{ but } W_t \ne W_i \text{ for any } i \text{ such that } s < i < t\}.$$
We call $\Pr\{\mathcal{T}_s^t\}$ the probability of return at step $t$ after step $s$.

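When $s = 0$, we write $\mathcal{T}^t$ for $\mathcal{T}_0^t$. From Definitions 3 and 4 one can check that
$$\mathcal{R}^t = \bigcup_{s=0}^{t-1} \mathcal{T}_s^t, \quad (2)$$
where the union is a disjoint one. Furthermore, the sets are shift-invariant in the sense that
$$\Pr\{\mathcal{T}_s^t\} = \Pr\{\mathcal{T}^{t-s}\}. \quad (3)$$
Combining (2) and (3), we also have that
$$\Pr\{\mathcal{R}^t\} = \sum_{s=0}^{t-1} \Pr\{\mathcal{T}_s^t\} = \sum_{i=1}^{t} \Pr\{\mathcal{T}^i\}. \quad (4)$$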

In addition to the above, for the case of the Bernoulli random walk we have the following [21], [22].
Lemma 1: For the Bernoulli random walk with $p_W < 1/2$, the following holds:

(i) $\lim_{t \to \infty} \Pr\{\bar{\mathcal{R}}^t\} = 1 - 2p_W$.

(ii) For $t > 0$, $\Pr\{\mathcal{T}^{2t-1}\} = 0$, and $\Pr\{\mathcal{T}^{2t}\} = 2C_{t-1}\big((1 - p_W)p_W\big)^t$, where $C_t := \frac{1}{t+1}\binom{2t}{t}$.
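As a numerical sanity check of Lemma 1, the sketch below evaluates the first-return probabilities in (ii), accumulates them through (4) to obtain the new-site probabilities $\Pr\{\bar{\mathcal{R}}^t\}$, and compares their limit with (i). It also simulates one long walk and counts the new sites it visits, which is the quantity behind the $(1 - 2p_W)H(X)$ term below. The function names are hypothetical; the Catalan ratio $C_t / C_{t-1} = 2(2t-1)/(t+1)$ is used only to avoid forming huge integers.

```python
import random


def first_return_probs(t_max, p_w):
    """Return [Pr{T^1}, ..., Pr{T^{t_max}}] using Lemma 1(ii).

    Returns at odd steps are impossible; Pr{T^{2t}} = 2 C_{t-1} q^t with
    q = p_w (1 - p_w). The ratio C_t / C_{t-1} = 2(2t - 1)/(t + 1) updates the
    term in place instead of forming binomial coefficients.
    """
    q = p_w * (1 - p_w)
    term = 2 * q  # Pr{T^2} = 2 C_0 q
    probs = []
    for k in range(1, t_max + 1):
        if k % 2 == 1:
            probs.append(0.0)
        else:
            probs.append(term)
            t = k // 2
            term *= q * 2 * (2 * t - 1) / (t + 1)  # now equals Pr{T^{2(t+1)}}
    return probs


def new_site_probs(t_max, p_w):
    """Return [Pr{not R^1}, ..., Pr{not R^{t_max}}] via (4)."""
    out, cumulative = [], 0.0
    for f in first_return_probs(t_max, p_w):
        cumulative += f  # Pr{R^k} = sum_{i <= k} Pr{T^i}
        out.append(1.0 - cumulative)
    return out


def empirical_new_site_fraction(t_max, p_w, seed=0):
    """Fraction of the steps 1..t_max at which one simulated walk reaches a new site."""
    rng = random.Random(seed)
    w = lo = hi = new = 0
    for _ in range(t_max):
        w += 1 if rng.random() < p_w else -1
        if w < lo or w > hi:  # visited sites always form the interval [lo, hi]
            new += 1
            lo, hi = min(lo, w), max(hi, w)
    return new / t_max


print(new_site_probs(2000, 0.3)[-1], empirical_new_site_fraction(200_000, 0.3))
```

For $p_W = 0.3$ both numbers should be close to $1 - 2p_W = 0.4$. With $p_W = 0.5$ the same functions show the new-site probability decaying slowly toward 0 instead of settling at a positive value.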
III. Information Rates for a Static Reality

A. Lossless information rates for the discrete memoryless wall 

Denote $V^t = (V_1, \ldots, V_t)$. We assume that $V_0$ is known to the decoder. Unless otherwise specified,
we assume that $X$ takes values on a finite alphabet $\mathcal{X}$. We seek to quantify the entropy rate of $V$ [23]:
$$H(V) = \lim_{t \to \infty} \frac{1}{t} H(V^t) = \lim_{t \to \infty} H(V_t \mid V^{t-1}). \quad (5)$$

To characterize $H(V)$, we describe intuitively an upper and a lower bound (resp. sufficient and necessary
rates) that will be formalized in Theorem 1 below. For a sufficient rate, note that $V$ can be reproduced
up to time $t$ when both the trajectory $W^t = (W_1, \ldots, W_t)$ and the samples of the wall occurring at
the new sites of $W^t$ are available. When $t$ is large, this amounts to $H(W^t) = tH(p_W)$ bits for the
trajectory, plus $\sum_{i=1}^{t} \Pr\{\bar{\mathcal{R}}^i\} H(X) \approx t(1 - 2p_W)H(X)$ bits for the new sites. So, a sufficient average rate is
$H(p_W) + (1 - 2p_W)H(X)$. Moreover, the complexity of $V$ is at least the total complexity of all visited
new sites, and so $(1 - 2p_W)H(X)$ is a necessary rate. This intuitive lower bound can be improved by
examining the probability of correctly inferring the random path $W^t$ from observing the vector process
$V^t$. This probability is related to the following event:
$$A_L := \{(X_0, \ldots, X_L) = (x_0, x_1, x_0, x_1, \ldots),\ x_0, x_1 \in \mathcal{X}\}. \quad (6)$$

The probability of the event $A_L$ is closely related to the probability that the observation is ambiguous,
which makes the trajectory unidentifiable. To see this, let $L = 4$ and consider inferring $W_1$ from the observation
of $(V_0, V_1)$. If $V_0 = (x_0, x_1, x_0, x_1)$ and $V_1 = (x_1, x_0, x_1, x_0)$, then $W_1$ cannot be
unambiguously determined from $(V_0, V_1)$. Intuitively, if $W^t$ can be determined from $V^t$, then the complexity
of the trajectory is embedded in $V^t$ and thus independently adds to the information complexity of $X$.
If, however, there is ambiguity on $W^t$, then the sets of paths $w^t$ that are consistent with a particular observation $v^t$
can be indexed and coded with a lower rate. We are now ready to state and prove Theorem 1.
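The ambiguity in the $L = 4$ example can be checked mechanically. The sketch below (the function name is hypothetical) lists which first steps $N_1 \in \{-1, +1\}$ are consistent with an observed pair $(V_0, V_1)$, using the fact that a $+1$ step shifts the window right and a $-1$ step shifts it left.

```python
def consistent_steps(v0, v1):
    """Return the set of first steps N_1 in {-1, +1} consistent with observing (V_0, V_1).

    W_1 = +1 means V_1 = (X_1, ..., X_L), so V_1 must start with V_0[1:].
    W_1 = -1 means V_1 = (X_{-1}, ..., X_{L-2}), so V_1 must end with V_0[:-1].
    """
    steps = set()
    if v1[:-1] == v0[1:]:
        steps.add(+1)
    if v1[1:] == v0[:-1]:
        steps.add(-1)
    return steps


# The L = 4 example from the text, with x_0 = 0 and x_1 = 1: both steps fit (ambiguous).
print(consistent_steps((0, 1, 0, 1), (1, 0, 1, 0)))
# A non-alternating wall: only the +1 step fits.
print(consistent_steps((0, 1, 2, 3), (1, 2, 3, 7)))
```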

Theorem 1: Consider the vector process $V$ consisting of $L$-tuples generated by a Bernoulli random
walk with transition probability $p_W < 1/2$ and a wall process $X$ drawing values i.i.d. on a finite alphabet,
with entropy $H(X)$. The conditional entropy $H(V_t \mid V^{t-1})$ obeys
$$\Pr\{\bar{\mathcal{R}}^t\} H(X) + H(p_W) - H(P_e) \le H(V_t \mid V^{t-1}) \le \frac{1}{t} \sum_{i=1}^{t} \Pr\{\bar{\mathcal{R}}^i\} H(X) + H(p_W), \quad (7)$$
where $P_e$ is the probability of error in estimating $W_1$ from observing $(V_1, V_0)$. The entropy rate $H(V)$
satisfies
$$(1 - 2p_W)H(X) + H(p_W) - H(P_e) \le H(V) \le (1 - 2p_W)H(X) + H(p_W). \quad (8)$$
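Proof: For each $t$ we have
$$
\begin{aligned}
H(V_t \mid V^{t-1}) &\overset{(a)}{\le} \frac{1}{t} \sum_{i=1}^{t} H(V_i \mid V^{i-1}) = \frac{1}{t} H(V^t) \\
&\overset{(b)}{\le} \frac{1}{t}\big(H(V^t) + H(W^t \mid V^t)\big) \qquad (9)\\
&= \frac{1}{t}\big(H(W^t) + H(V^t \mid W^t)\big) \qquad (10)\\
&\overset{(c)}{=} H(p_W) + \frac{1}{t} \sum_{i=1}^{t} H(V_i \mid V^{i-1}, W^i), \qquad (11)
\end{aligned}
$$
where (a) follows because $H(V_t \mid V^{t-1})$ decreases with $t$, (b) holds because $H(W^t \mid V^t) \ge 0$, and (c) is
true because $H(W^t) = tH(p_W)$ and $(W_{i+1}, \ldots, W_t)$ is independent of $(V^i, W^i)$. Further, it is true that
$$H(V_i \mid V^{i-1}, W^i = w^i,\ w^i \text{ is recurrent}) = 0,$$
$$H(V_i \mid V^{i-1}, W^i = w^i,\ w^i \text{ is not recurrent}) = H(X).$$

Consequently,
$$H(V_i \mid V^{i-1}, W^i) = \sum_{w^i} \Pr\{W^i = w^i\} H(V_i \mid V^{i-1}, W^i = w^i) = \Pr\{\bar{\mathcal{R}}^i\} H(X). \quad (12)$$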

Combining (11) and (12) gives the upper bound in (7). We now turn to the lower bound. Using the
chain rule for mutual information and the information inequality, we have
$$
\begin{aligned}
H(V_t \mid V^{t-1}) &= H(V_t \mid V^{t-1}, W^t) + I(W^t; V_t \mid V^{t-1}) \\
&= H(V_t \mid V^{t-1}, W^t) + I(W^{t-1}; V_t \mid V^{t-1}) + I(W_t; V_t \mid V^{t-1}, W^{t-1}) \\
&\ge H(V_t \mid V^{t-1}, W^t) + I(W_t; V_t \mid V^{t-1}, W^{t-1}). \qquad (13)
\end{aligned}
$$
Moreover, because the random walk increment $W_t - W_{t-1}$ is independent of $(V^{t-1}, W^{t-1})$, it follows that
$$
\begin{aligned}
I(W_t; V_t \mid V^{t-1}, W^{t-1}) &= H(W_t \mid V^{t-1}, W^{t-1}) - H(W_t \mid V^t, W^{t-1}) \\
&= H(p_W) - H(W_t \mid V^t, W^{t-1}). \qquad (14)
\end{aligned}
$$
We proceed by finding an upper bound for $H(W_t \mid V^t, W^{t-1})$. Because conditioning reduces entropy, and
using Markovianity together with the stationarity of the model, we have that
$$H(W_t \mid V^t, W^{t-1}) \le H(W_t \mid V_t, V_{t-1}, W_{t-1}) = H(W_1 \mid V_1, V_0). \quad (15)$$
Denote by $P_e$ the probability of error of estimating $W_1$ from observing $(V_1, V_0)$. Since $W_1$ takes only two values, Fano's inequality
gives
$$H(W_1 \mid V_1, V_0) \le H(P_e) + P_e \log_2(2 - 1) = H(P_e). \quad (16)$$
Combining this with (13)–(14) and (12), we assert the lower bound in (7). By letting $t \to \infty$ in (7) and
using Lemma 1(i), we obtain (8). ∎

Remark 1: The upper bound of Theorem 1 contains slack. One trivial example is when the entropy
of the process $X$ is 0. In such a case the upper bound reduces to $H(p_W)$, which is clearly loose given
that the vector process $V$ has zero entropy in this case. The size of the conditional entropy $H(W^t \mid V^t)$
determines the amount of slack in the bounds (see (9)). This entropy depends, among other things, on
the size of the alphabet of the process $V$ and on the block length $L$, as the next example illustrates.

Remark 2: When $L$ is odd, an expression for the slack in terms of the probability
of the event $A_L$ in (6) can be obtained. Denote by $\mathcal{A}_1$ the set of pairs $(v_0, v_1)$ for which $W_1$ cannot
be inferred with probability one. Then, because $L$ is odd, it is straightforward to infer that¹
$\Pr\{W_1 = 1 \mid (v_0, v_1) \in \mathcal{A}_1\} = p_W$, whereas $W_1$ is determined by $(v_0, v_1)$ whenever $(v_0, v_1) \notin \mathcal{A}_1$. Consequently, we have that
$$H(W_1 \mid V_0, V_1) = \Pr\{\mathcal{A}_1\} H\big(\Pr\{W_1 = 1 \mid (v_0, v_1) \in \mathcal{A}_1\}\big) = \Pr\{\mathcal{A}_1\} H(p_W). \quad (17)$$
The set $\mathcal{A}_1$ is contained in the set $\{V_0 = (x_0, x_1, \ldots),\ V_1 = (x_1, x_0, \ldots)\}$. Therefore, we have that
$\Pr\{\mathcal{A}_1\} \le \Pr\{A_L\}$, and so $H(W_1 \mid V_0, V_1) \le \Pr\{A_L\} H(p_W)$. Using this bound in place of Fano's inequality (16) in the proof of Theorem 1,
we obtain the following bound:
$$(1 - 2p_W)H(X) + H(p_W)\Pr\{\bar{A}_L\} \le H(V) \le (1 - 2p_W)H(X) + H(p_W). \quad (18)$$

This special case of the results in Theorem 1 is useful because the slack can be computed analytically,
as in Example 1 below.

¹Note that when $L$ is even, we cannot assert that $\Pr\{W_1 = 1 \mid (v_0, v_1) \in \mathcal{A}_1\} = p_W$.
Remark 3: Note that for any $L$ we always have $P_e \le \Pr\{A_L\}$, so that in many cases, as $L \to \infty$,
$\Pr\{A_L\} \to 0$ and the bounds in Theorem 1 become tight. Theorem 1 shows that, under some
conditions, optimal encoding in the information-theoretic sense can be attained by extracting and optimally
coding the trajectory $W^t$, and optimally coding the spatial innovations in the vector samples $V^t$.

Remark 4: For the symmetric random walk case, there is an intuitive explanation for the $H(p_W)$ upper
bound. At time $t$, with high probability we have that $-c\sqrt{t} < W_s < c\sqrt{t}$ for $0 \le s \le t$. Therefore, with
high probability, the number of new sites visited up to time $t$, which equals $\max_{0 \le s \le t} W_s - \min_{0 \le s \le t} W_s$, is less
than $2c\sqrt{t}$, so the number of new sites per time step is less than $2c/\sqrt{t}$, which converges to zero as $t \to \infty$. Thus, the term corresponding to the entropy rate of the
source $X$ vanishes in the information rate of the $V$ process.

Example 1: Suppose that $X$ is uniformly distributed over $|\mathcal{X}|$ values. Then, since $x_0$ and $x_1$ are free in (6) while the remaining $L - 1$ samples are forced, it is easily seen that
$$\Pr\{A_L\} = |\mathcal{X}|^{-(L-1)}.$$
Consequently, the difference between the upper and lower bounds in (18) decays exponentially fast as the
block length $L \to \infty$. For a fixed $L$, the difference also decays as $|\mathcal{X}|$ increases. Thus, for $L$ and $|\mathcal{X}|$
sufficiently large, we have that $\Pr\{A_L\} \approx 0$, and we can approximate the entropy rate as
$$H(V) \approx (1 - 2p_W)\log_2|\mathcal{X}| + H(p_W)$$
bits per block. Note that if $p_W = 1/2$, then the recurrence property of the random walk generates
redundancy that has the effect of reducing the entropy of the vector process. Figure 3 illustrates the
bounds when $X$ is Bern(1/2) and $L = 9$. We see that in this case the derived upper and lower bounds
are very tight.
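The closed-form bounds (18) are easy to tabulate. The sketch below assumes a wall uniform over $|\mathcal{X}|$ symbols and uses $\Pr\{A_L\} = |\mathcal{X}|^{-(L-1)}$, which follows from (6) because $x_0$ and $x_1$ are free and the remaining $L - 1$ samples are forced to alternate. With `alphabet_size=2` and `L=9` it evaluates the setting of Figure 3 at a few values of $p_W$; the function names are illustrative.

```python
import math


def h2(p):
    """Binary entropy function in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def example1_bounds(p_w, alphabet_size, L):
    """Lower and upper bounds (18) on H(V) for a wall uniform over alphabet_size symbols.

    Uses Pr{A_L} = alphabet_size ** -(L - 1): in (6), X_0 and X_1 are free and the
    remaining L - 1 samples are forced to alternate.
    """
    h_x = math.log2(alphabet_size)
    pr_a = alphabet_size ** -(L - 1)
    upper = (1 - 2 * p_w) * h_x + h2(p_w)
    lower = (1 - 2 * p_w) * h_x + h2(p_w) * (1 - pr_a)
    return lower, upper


for p_w in (0.05, 0.25, 0.45):
    lo, up = example1_bounds(p_w, alphabet_size=2, L=9)
    print(f"p_W = {p_w:.2f}: {lo:.4f} <= H(V) <= {up:.4f} bits per block")
```

The gap `up - lo` equals $H(p_W)/256$ here, which is why the two curves in Figure 3 are hard to tell apart.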

B. Memory constrained coding 

From source-coding theory, the entropy rate $H(V)$ can be attained with an encoder-decoder pair
with unbounded memory and delay. In the finite-memory case, the encoder often has to code $V_t$ based
on the observation of $V_{t-1}, \ldots, V_{t-M}$, and the decoder proceeds accordingly. This situation is similar
to the one encountered in video compression, where a frame at time $t$ is coded based on $M$ previously
coded frames [9]. In this case, the average code length is bounded below by the conditional entropy
$H(V_t \mid V_{t-1}, \ldots, V_{t-M}) = H(V_M \mid V_{M-1}, \ldots, V_0)$. The bound (7) in Theorem 1 describes the behavior of
this conditional entropy as a function of $M$. Intuitively, by looking at the stored samples from $t - M$ up to
$t$, the encoder can separately code $W_t$ and take advantage of recurrences present from $t - M$ to $t - 1$.
Fig. 3. Bounds on information rate. Lower and upper bounds as a function of $p_W$ for the binary wall with $p_X = 1/2$ and
$L = 9$.

In effect, finite memory prevents the encoder from exploiting long-term recurrences that are not visible in the
memory. Similar observations have been made in practice, for instance, in [13], [12], [14].

Figure 4 illustrates how memory influences coding when $X$ is uniform over an alphabet of size
$|\mathcal{X}| = 256$. The curves are computed using the upper bound in (7). Because the alphabet size is large,
the bound is tight. In the most recurrent case, $p_W = 0.5$, the conditional entropy approaches the
entropy rate more slowly as $M \to \infty$ [see (7)]. Furthermore, as $M$ grows, there is
a significant reduction in the conditional entropy. For instance, an encoder that uses 1 frame in the past
with optimal coding would need about twice as many bits as one that uses 4 frames. By contrast, when
$p_W = 0.1$, because longer-term recurrences are rare, moderate values of $M$ are already enough to attain
the limiting rate. As a result, there is little to gain by increasing $M$.
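Curves of the kind shown in Figure 4 can be approximated by evaluating the upper bound in (7) at $t = M$, i.e. $\frac{1}{M}\sum_{i=1}^{M}\Pr\{\bar{\mathcal{R}}^i\}H(X) + H(p_W)$, with the new-site probabilities obtained from Lemma 1(ii) and (4). This is a sketch under that reading of the text, with hypothetical helper names and $H(X) = \log_2 256 = 8$ bits.

```python
import math


def h2(p):
    """Binary entropy function in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def new_site_probs(t_max, p_w):
    """[Pr{not R^1}, ..., Pr{not R^{t_max}}] from Lemma 1(ii) and (4)."""
    q = p_w * (1 - p_w)
    term = 2 * q        # Pr{T^2} = 2 C_0 q
    cumulative = 0.0    # Pr{R^k} = sum_{i <= k} Pr{T^i}
    out = []
    for k in range(1, t_max + 1):
        if k % 2 == 0:  # returns happen only at even steps
            cumulative += term
            t = k // 2
            term *= q * 2 * (2 * t - 1) / (t + 1)  # Catalan ratio C_t / C_{t-1}
        out.append(1.0 - cumulative)
    return out


def memory_upper_bound(M, p_w, h_x):
    """Right-hand side of (7) with t = M, in bits per vector sample."""
    return sum(new_site_probs(M, p_w)) / M * h_x + h2(p_w)


for p_w in (0.1, 0.5):
    curve = [memory_upper_bound(M, p_w, h_x=8.0) for M in (1, 2, 4, 16, 64)]
    print(p_w, [round(r, 2) for r in curve])
```

Comparing the two printed rows shows how much more strongly the $p_W = 0.5$ bound depends on $M$ than the $p_W = 0.1$ bound does.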

The observations drawn from Figure 4 have also been verified in practice, for instance, in [12], [24], [13].
Finally, we point out that the issue of exploiting long-term recurrences dates back to Ziv and Lempel [25] in
lossless compression. An extension of the Lempel-Ziv algorithm to the lossy case is discussed in [26], and
lossless compression of two-dimensional arrays in [27]. More recently, a universal scheme to optimally
scan and predict data in a multidimensional field, with applications to video, was presented in [28].
Fig. 4. Memory-constrained coding. Difference between $H(V_M \mid V_{M-1}, \ldots, V_0)$ and $H(V)$ as a function of $M$. When $p_W = 0.5$, the bit rate can
be lowered significantly at the cost of a large memory. A moderate bit rate reduction is obtained with small values of $M$ when
$p_W = 0.1$. The curves are computed using Theorem 1 for $X$ uniform over an alphabet of size 256.

C. Lossy information rates 

In this section we assume again that the process $X$ is i.i.d. and that $X_n$ takes values over a finite
alphabet $\mathcal{X}$. Information rates for the lossy case take the form of a rate-distortion function. Consider a
$t$-tuple $(V_1, \ldots, V_t)$ where each $V_i$ is a random vector taking values in $\mathcal{X}^L$. A reproduction $t$-tuple is
denoted by $(\hat{V}_1, \ldots, \hat{V}_t)$, and its entries take values on a reproduction alphabet $\hat{\mathcal{X}}^L$. A distortion measure
is defined as follows:



where $d_s : \mathcal{X}^L \times \hat{\mathcal{X}}^L \to \mathbb{R}^+$ is a distortion measure for an $L$-dimensional vector. For example, for the MSE
metric we have $d_s(V_i, \hat{V}_i) = \|V_i - \hat{V}_i\|^2$.

The rate-distortion function for each $t$, and for a given distortion measure, is written as
$$R_{V^t}(D) = \inf_{\mathbb{E}\, d(V^t, \hat{V}^t) \le D} \frac{1}{t} I(V^t; \hat{V}^t), \quad (19)$$
where the infimum of the normalized mutual information $\frac{1}{t} I(V^t; \hat{V}^t)$ is taken over all conditional pdfs
$\Pr(\hat{V}^t \mid V^t)$ such that $\mathbb{E}\, d(V^t, \hat{V}^t) \le D$.

The rate-distortion function for the process $V = (V_1, V_2, \ldots)$ is given by [20]
$$R_V(D) = \lim_{t \to \infty} R_{V^t}(D). \quad (20)$$
Because the process $V$ is stationary, it can be shown that the above limit always exists (see [20, p. 270]
or [?]).

By coding the side information $W^t$ separately, an upper bound for $R_V(D)$ similar to Theorem 1 can
be developed. The upper bound is based on the notion of conditional rate-distortion [30], [?]. This
notion is developed in the lemma below.

Lemma 2: (Gray [30]) Let $V$ be a random vector taking values in $\mathcal{X}$ and let $W$ be another random
variable. Define the conditional rate-distortion
$$R_{V|W}(D) = \inf_{\mathbb{E}\, d(V, \hat{V}) \le D} I(V; \hat{V} \mid W), \quad (21)$$
where the infimum is taken over all conditional distributions of $\hat{V}$ given $V$ and $W$. The conditional
rate-distortion obeys
$$R_{V|W}(D) \le R_V(D) \le R_{V|W}(D) + I(V; W). \quad (22)$$
The conditional rate-distortion of $V^t$ conditioned on $W^t$ is defined as follows:
$$R_{V^t|W^t}(D) = \inf_{\mathbb{E}\, d(V^t, \hat{V}^t) \le D} \frac{I(V^t; \hat{V}^t \mid W^t)}{t}, \quad (23)$$
where the infimum is taken over all probability distributions of $\hat{V}^t$ conditional on $V^t$ and $W^t$. The
conditional rate-distortion can be bounded in terms of the rate-distortion function of the process $X$.
Proposition 1: The conditional rate-distortion function satisfies
$$\limsup_{t \to \infty} R_{V^t|W^t}(D) \le (1 - 2p_W) R_X(D). \quad (24)$$

Proof: Let $\lambda(w^t)$ denote the number of new sites in the path $w^t$. Then, conditional on $w^t$,
$V^t$ has only $\lambda(w^t)$ entries that need to be encoded. For each $(w^t, v^t)$, let $f_{w^t}(v^t)$ denote the vector with
the $\lambda(w^t)$ entries of $v^t$ to be coded. Moreover, let $V^t$ and $\hat{V}^t$ be such that $\mathbb{E}\, d_s(V_i[j], \hat{V}_i[j]) \le D$ for
$i = 0, \ldots, t$ and $j = 0, \ldots, L - 1$. We have
$$
\begin{aligned}
I(V^t; \hat{V}^t \mid W^t) &= \sum_{w^t} \Pr\{W^t = w^t\} I(V^t; \hat{V}^t \mid W^t = w^t) \\
&\ge \sum_{w^t} \Pr\{W^t = w^t\} I\big(f_{w^t}(V^t); f_{w^t}(\hat{V}^t) \mid W^t = w^t\big) \\
&\ge \sum_{w^t} \Pr\{W^t = w^t\} \lambda(w^t) R_X(D) \\
&= \mathbb{E}[\lambda(W^t)] R_X(D), \qquad (25)
\end{aligned}
$$
where we have used the inequality $I(X; Y) \ge I(f(X); g(Y))$ for measurable functions $f, g$ [31], the
fact that the process $X$ is i.i.d. and independent of $W^t$, and the fact that the individual distortions are at most
$D$. The lower bound can be achieved as follows. Let $p^*(\hat{X} \mid X)$ be the test channel that attains $R_X(D)$.
We let $\hat{X}_n$ be the result of passing $X_n$ through the channel $p^*(\hat{X}_n \mid X_n)$. For each given $w^t$ we construct
$\hat{V}^t$ from $\hat{X}$ and $w^t$. This results in a joint conditional distribution that attains the lower bound (25).
Because the lower bound is attainable, it follows that
$$R_{V^t|W^t}(D) \le \frac{\mathbb{E}[\lambda(W^t)]}{t} R_X(D).$$
Moreover, using Lemma 1 it is straightforward to check that $t^{-1}\mathbb{E}[\lambda(W^t)]$ converges to $1 - 2p_W$, which
concludes the proof. ∎

The above proposition enables us to derive an upper bound for the rate-distortion function.

Theorem 2: Consider the i.i.d. process $X$ such that $X_n$ takes values over a finite alphabet $\mathcal{X}$. Let
$R_X(D)$ denote its rate-distortion function. The rate-distortion function of the process $V$ satisfies
$$R_V(D) \le H(p_W) + (1 - 2p_W) R_X(D). \quad (26)$$

Proof: Using Lemma 2, we have the following bound based on the conditional rate-distortion function
(23):
$$
\begin{aligned}
R_{V^t|W^t}(D) \le R_{V^t}(D) &\le R_{V^t|W^t}(D) + \frac{1}{t} I(V^t; W^t) \\
&\le R_{V^t|W^t}(D) + H(p_W).
\end{aligned}
$$
Letting $t \to \infty$ and using Proposition 1 asserts (26). ∎
Remark 5: To describe $V^t$ to the decoder with average expected distortion less than $D$, we proceed as
follows. Convey the trajectory $W^t$ to the decoder, spending on average $\approx tH(p_W)$ bits. Then describe the
"spatial innovations" with an average expected distortion less than $D$, spending $\approx \lambda_t R_X(D)$ bits, where
$\lambda_t \approx t(1 - 2p_W)$ is the number of new sites visited up to time $t$. On average, this scheme
needs $H(p_W) + (1 - 2p_W)R_X(D)$ bits per vector sample, which is the upper bound presented in (26).
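To make the two-part scheme of Remark 5 concrete, assume (for illustration only) that the wall is Bern(1/2) with per-sample Hamming distortion, for which the standard result is $R_X(D) = 1 - H(D)$ for $0 \le D < 1/2$ and $0$ beyond. The sketch evaluates the right-hand side of (26) under that assumption; the function names are hypothetical.

```python
import math


def h2(p):
    """Binary entropy function in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def binary_hamming_rd(D):
    """R_X(D) for a Bern(1/2) source under Hamming distortion (illustrative assumption)."""
    return 1.0 - h2(D) if D < 0.5 else 0.0


def two_part_rate(p_w, D):
    """Right-hand side of (26): trajectory bits plus innovation bits per vector sample."""
    trajectory_bits = h2(p_w)                                # code W^t losslessly
    innovation_bits = (1 - 2 * p_w) * binary_hamming_rd(D)   # code the new sites lossily
    return trajectory_bits + innovation_bits


for D in (0.0, 0.05, 0.1, 0.25):
    print(f"D = {D:.2f}: R_V(D) <= {two_part_rate(0.25, D):.4f} bits per vector sample")
```

At $D = 0$ the bound coincides with the upper bound in (8), consistent with Remark 6 below, and the trajectory cost $H(p_W)$ is paid at every distortion level.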
Remark 6: Because the alphabet is finite, if the reproduction alphabet $\hat{\mathcal{X}}$ is a superset of the original
alphabet $\mathcal{X}$ and, in addition, the distortion measure is such that $d(x, \hat{x}) = 0$ if and only if $\hat{x} = x$, then we
have that, for each $t$, $R_{V^t}(D)$ converges to $t^{-1}H(V^t)$ as $D \to 0$ [20]. Consequently, for large alphabet
sizes and large block lengths, the entropy rate bound of (8) is sharp, and so the above bound on the
rate-distortion function is also sharp for small distortion values.

Theorem 2 shows that in the low-distortion regime, optimal encoding in the information-theoretic sense
can be attained by extracting and coding $W^t$ losslessly, and using the remaining bits to optimally code
the vector samples corresponding to spatial innovations. This statement has implications, for example, for
high-rate video coding, since it indicates that motion should be encoded exactly, with the remaining bits
allocated to prediction errors.

IV. Information Rates for Dynamic Reality 

A. Model 

The model in the previous section assumes a "static background." More precisely, the infinite wall
process $X$ is drawn at time 0 and does not change after that. In practice, however, the scene background
changes with time, and a suitable model would have to account for those changes. New information comes
fundamentally in two forms: the first consists of information that is "seen" by the camera for the first
time, while the second consists of changes to old information (e.g., changes in the background). In this
section, we propose a model that accounts for both of these sources of new information.

To develop a model for scenes that change over time, we model $X$ as a 2-D random field indexed by
$(n, t) \in \mathbb{Z} \times \mathbb{Z}^+$. A simple yet rich model for the field is a first-order Markov model over time
that is i.i.d. in space. The random field is defined as follows:

Definition 5: The random field is the field $\mathrm{RF} = \{X_n^{(t)} : (n, t) \in \mathbb{Z} \times \mathbb{Z}^+\}$ such that $(X_n^{(t)} : n \in \mathbb{Z})$
is i.i.d. and, for each $n \in \mathbb{Z}$, the process $(X_n^{(t)} : t \in \mathbb{Z}^+)$ is a first-order time-homogeneous Markov process
possessing a stationary distribution.

The fact that the random field $(X_n^{(t)} : n \in \mathbb{Z})$ is i.i.d. simplifies calculations considerably. One
justification for this model arises when the field is Gaussian. In that case, independence is attained by a
simple linear transformation of the process $(X_n^{(t)} : n \in \mathbb{Z})$. It can be shown that such a transformation
preserves Markovianity in the time dimension, so the i.i.d. assumption can be justified in this case.

Throughout this section, we assume that the Markov chain of the vector process is already in steady
state. This assumption is common, for example, in calculating rate-distortion functions for Gaussian
processes with memory [20].
The dynamic vector process V is defined similarly to the static case, but now taking snapshots, or vectors, from the random field:

Definition 6: Let $RF = \{X_n^{(t)} : (n, t) \in \mathbb{Z} \times \mathbb{Z}_+\}$ be a random field, and let $W$ be a random walk. The dynamic vector process is the process $V = (V_t : t \in \mathbb{Z}_+)$ such that, for each $t \ge 0$,

$$ V_t = \left( X_{W_t}^{(t)}, X_{W_t+1}^{(t)}, \ldots, X_{W_t+L-1}^{(t)} \right). $$

The random field and the corresponding vector process are illustrated in Figure 5.
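As a minimal sketch of Definition 6, the Python below simulates $V_t$ for the Gaussian AR(1) field introduced later in Example 3. The step convention (+1 with probability $1 - p_w$, -1 with probability $p_w$), the choice $W_0 = 0$, and the name `simulate_dynamic_vectors` are illustrative assumptions, not taken from the text. Unvisited locations are drawn lazily from the stationary N(0, 1) law, which is valid because the field is i.i.d. in space and stationary in time.

```python
import random

def simulate_dynamic_vectors(T, L, rho, p_w, seed=0):
    """Draw V_0, ..., V_{T-1} from Definition 6 with a unit-variance AR(1) field.

    Assumptions (illustrative only): W_0 = 0, steps are +1 w.p. 1 - p_w and
    -1 w.p. p_w, and X_n^{(t)} = rho X_n^{(t-1)} + e with e ~ N(0, 1 - rho^2).
    """
    rng = random.Random(seed)
    sigma = (1.0 - rho * rho) ** 0.5   # innovation std keeps Var(X) = 1
    field = {}                          # location n -> current value X_n^{(t)}
    w = 0
    vectors, path = [], []
    for t in range(T):
        if t > 0:
            w += 1 if rng.random() < 1.0 - p_w else -1
            for n in field:             # one AR(1) step for every stored location
                field[n] = rho * field[n] + rng.gauss(0.0, sigma)
        for n in range(w, w + L):
            if n not in field:          # unseen location: stationary N(0, 1) draw
                field[n] = rng.gauss(0.0, 1.0)
        vectors.append([field[n] for n in range(w, w + L)])
        path.append(w)
    return vectors, path

vectors, path = simulate_dynamic_vectors(T=5, L=4, rho=0.99, p_w=0.5)
```

With `rho=1.0` the field is static, so consecutive vectors share $L-1$ samples exactly; for `rho < 1` the shared samples are only correlated, which is what the entropy computations below quantify.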
Fig. 5. A model for the dynamic reality. (a) First, there is a random field that is Markov in the time dimension $t$ and i.i.d. in the spatial dimension $n$. (b) Motion then occurs within this random field.



We point out here that the proposed random field model is a simplified depiction of real visual scene data. For instance, we acknowledge that the spatial independence assumption in non-Gaussian cases is not met in practice, and that camera motion is not i.i.d. in practice. We stress, however, that true rate-distortion bounds are difficult to derive for more elaborate sources, and that even a simplified model with true coding bounds is useful, provided its deficiencies are acknowledged.

B. Lossless information rates 

In the development that follows we assume, for simplicity, that the random field takes values in a finite alphabet $\mathcal{X}$. The results can equally be developed for a random field taking values in $\mathbb{R}$, under suitable technical conditions.

To derive bounds for $H(V)$ in the dynamic reality case, we compute the following conditional entropy rate:

$$ H(V \mid W) := \lim_{t \to \infty} H(V_t \mid V^{t-1}, W^t), \tag{27} $$
if the limit exists. As we shall see in the examples that follow, the above limit can be computed analytically. The key is to compute $H(V_t \mid V^{t-1}, W^t = w^t)$ by splitting the set of all paths into recurrent and nonrecurrent paths, and further splitting the set of recurrent paths according to the time elapsed since the previous visit.

Referring to Figure 5(b), let $w^t$ be a given path and consider the process $V_t$. Note that each $V_t$ has $L-1$ entries at the same spatial locations as $L-1$ entries of $V_{t-1}$. The remaining entry corresponds to either a nonrecurrent or a recurrent location, depending on $w^t$. If $w^t$ is nonrecurrent, then by the Markov property of the field we have

$$ H(V_t \mid V^{t-1}, W^t = w^t) = H(X^{(0)}) + (L-1) H(X^{(1)} \mid X^{(0)}). $$

If a path is recurrent at $t$, then there is an $s < t$ such that $w_s = w_t$ but $w_i \ne w_t$ for $s < i < t$. Using the Markov property again, it follows that $H(V_t \mid V^{t-1}, W^t = w^t) = H(X^{(t-s)} \mid X^{(0)}) + (L-1) H(X^{(1)} \mid X^{(0)})$.
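The above argument is written out explicitly below. Let $\mathcal{N}^t$ denote the set of paths $w^t$ that are nonrecurrent at $t$, $\mathcal{R}^t$ the set of paths that are recurrent at $t$, and $\mathcal{R}^t_{2i} \subset \mathcal{R}^t$ the recurrent paths whose previous visit to $w_t$ occurred at time $s = t - 2i$ (returns are possible only after an even number of steps). Then

$$
\begin{aligned}
H(V_t \mid V^{t-1}, W^t) &= \sum_{w^t \in \mathcal{N}^t} H(V_t \mid V^{t-1}, W^t = w^t) \Pr\{W^t = w^t\} \\
&\quad + \sum_{w^t \in \mathcal{R}^t} H(V_t \mid V^{t-1}, W^t = w^t) \Pr\{W^t = w^t\} \qquad (28) \\
&= \left( H(X^{(0)}) + (L-1) H(X^{(1)} \mid X^{(0)}) \right) \Pr\{\mathcal{N}^t\} \\
&\quad + \sum_{i=1}^{\lfloor t/2 \rfloor} \sum_{w^t \in \mathcal{R}^t_{2i}} H(V_t \mid V^{t-1}, W^t = w^t) \Pr\{W^t = w^t\} \qquad (29) \\
&= \left( H(X^{(0)}) + (L-1) H(X^{(1)} \mid X^{(0)}) \right) \Pr\{\mathcal{N}^t\} \\
&\quad + \sum_{i=1}^{\lfloor t/2 \rfloor} \left( H(X^{(2i)} \mid X^{(0)}) + (L-1) H(X^{(1)} \mid X^{(0)}) \right) \Pr\{\mathcal{R}^t_{2i}\} \qquad (30) \\
&= (L-1) H(X^{(1)} \mid X^{(0)}) + H(X^{(0)}) \Pr\{\mathcal{N}^t\} + \sum_{i=1}^{\lfloor t/2 \rfloor} H(X^{(2i)} \mid X^{(0)}) \Pr\{\mathcal{R}^t_{2i}\}. \qquad (31)
\end{aligned}
$$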

Letting $t \to \infty$ and using Lemma 1(i), we obtain

$$ H(V \mid W) = H(X^{(0)})(1 - 2p_w) + (L-1) H(X^{(1)} \mid X^{(0)}) + \sum_{i=1}^{\infty} H(X^{(2i)} \mid X^{(0)}) \Pr\{T_{2i}\}, \tag{32} $$

where $\Pr\{T_{2i}\}$ is the probability of return given in Lemma 1(ii). The infinite sum on the right-hand side of (32) is well defined: it is a sum of nonnegative numbers, and since $H(X^{(2i)} \mid X^{(0)}) \le H(X^{(2i)}) = H(X^{(0)})$ by stationarity, it is bounded above by $H(X^{(0)}) \sum_{i=1}^{\infty} \Pr\{T_i\} = 2p_w H(X^{(0)})$. Note that we replaced $2i$ with $i$ in this last sum in view of the fact that $\Pr\{T_{2i+1}\} = 0$.
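The return probabilities entering (32) can be tabulated numerically. The sketch below assumes that $\Pr\{T_{2k}\}$ is the standard first-return law of a ±1 walk, $2C_{k-1}\,(p_w(1-p_w))^k$ with $C_j$ the Catalan numbers; this is an assumption consistent with the Catalan-number argument used in Example 3, not a quotation of the lemma cited above. It checks that the return probabilities sum to $2p_w$, so the nonrecurrent weight in (32) is $1 - 2p_w$. The function name is hypothetical.

```python
def first_return_probs(p_w, k_max):
    """Return [Pr{T_2}, Pr{T_4}, ..., Pr{T_{2 k_max}}] for a +/-1 random walk.

    Assumed law: Pr{T_2k} = 2 C_{k-1} (p_w (1 - p_w))^k, C_j = Catalan numbers.
    Uses C_k / C_{k-1} = 2(2k - 1)/(k + 1) to avoid large integers.
    """
    pq = p_w * (1.0 - p_w)
    probs = [2.0 * pq]                     # k = 1: one step out, one step back
    for k in range(1, k_max):
        probs.append(probs[-1] * pq * 2.0 * (2 * k - 1) / (k + 1))
    return probs

for p_w in (0.05, 0.1, 0.3):
    total = sum(first_return_probs(p_w, 2000))
    print(p_w, total, 2 * p_w)             # total should match 2 * p_w closely
```

For $p_w = 0.5$ the tail of this series decays only like $k^{-1/2}$, so a much larger truncation index is needed, which is consistent with truncating the sums at a very large index in the examples.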
With the conditional entropy rate in (32), we can derive lower and upper bounds on the entropy rate $H(V)$. To derive an upper bound, we bound $H(V^t)/t$ for each $t$ and let $t \to \infty$. For the lower bound, similarly to Section III, we bound $H(V_t \mid V^{t-1})$ from below. Because the alphabet $\mathcal{X}$ is finite and the process is stationary, the limits of $H(V^t)/t$ and $H(V_t \mid V^{t-1})$ as $t \to \infty$ coincide.

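The upper bound is obtained from the inequality $H(V^t) \le tH(p_w) + H(V^t \mid W^t)$. Note that $H(V^t \mid W^t) = \sum_{j=1}^{t} H(V_j \mid V^{j-1}, W^j)$, since the increments of $W$ after time $j$ are independent of $(V^j, W^j)$. Hence, if $H(V_t \mid V^{t-1}, W^t)$ converges to a limit as $t \to \infty$, then $t^{-1} H(V^t \mid W^t)$ necessarily converges to the same limit (see, e.g., [23, p. 64]). So,

$$
\begin{aligned}
\lim_{t \to \infty} \frac{H(V^t)}{t} &\le H(p_w) + \lim_{t \to \infty} \frac{H(V^t \mid W^t)}{t} \\
&= H(p_w) + \lim_{t \to \infty} H(V_t \mid V^{t-1}, W^t) \\
&= H(p_w) + H(V \mid W).
\end{aligned}
$$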

To derive a lower bound, note that the development leading to (13)-(14) for the static case also holds for the dynamic case. So, we have

$$ H(V_t \mid V^{t-1}) \ge H(p_w) + H(V_t \mid V^{t-1}, W^t) - H(W_t \mid V^t, W^{t-1}). \tag{33} $$

Thus, a lower bound is obtained by finding an upper bound for $H(W_t \mid W^{t-1}, V^t)$. Because the process $X$ changes at each time step, we cannot use the event $A_t$ to obtain an upper bound for $H(W_t \mid W^{t-1}, V^t)$ as in the static case. A useful upper bound is instead obtained from Fano's inequality.
Let $P_e$ denote the probability of error in estimating $W_t$ based on observing $Y_t := (V_t, V_{t-1}, W_{t-1})$, i.e.,

$$ P_e = \Pr\{\hat{W}(Y_t) \ne W_t\}, $$

where $\hat{W}(\cdot)$ is a given estimator, assumed to be the same for all $t$. Since $W_{t-1}$ is observed, estimating $W_t$ amounts to estimating the increment $N_t = W_t - W_{t-1}$. Because $V$ is stationary and $N_t$ is i.i.d., $P_e$ does not depend on $t$. Since $Y_t$ is a function of $(V^t, W^{t-1})$ and $N_t$ takes only two values, Fano's inequality gives

$$
\begin{aligned}
H(W_t \mid V^t, W^{t-1}) &\le H(N_t \mid Y_t) \\
&\le H(P_e) + P_e \log_2(1) \\
&= H(P_e). \qquad (34)
\end{aligned}
$$



Consequently, a lower bound is obtained by combining (33) with (34) above.² By letting $t \to \infty$, we arrive at the following:

² Sharper lower bounds can be obtained by estimating $N_t$ using $(V^t, W^{t-1})$. However, the estimate using $Y_t$ is easily computed and already leads to a sufficiently sharp bound.
Theorem 3: Consider the vector process $V$ consisting of $L$-tuples generated by a Bernoulli random walk with transition probability $p_w \le 1/2$ and the random field $RF = \{X_n^{(t)} : (n, t) \in \mathbb{Z} \times \mathbb{Z}_+\}$, with $|\mathcal{X}| < \infty$, that is i.i.d. in the $n$ dimension and first-order Markov in the $t$ dimension. The entropy rate of the process $V$ obeys

$$ H(p_w) + H(V \mid W) - H(P_e) \le H(V) \le H(p_w) + H(V \mid W), \tag{35} $$

where $H(V \mid W)$ is as in (32), and $P_e$ is the probability of error in estimating $W_1$ based on the observation of $Y_1 = (V_1, V_0, W_0)$ with any estimator $\hat{W}(Y_1)$.

The lower and upper bounds become sharp when $P_e \to 0$. This occurs for large block sizes and small changes in the background. The examples that follow illustrate the sharpness of the above bounds. In the first example, we consider a binary process $X$, and in the second, a Gaussian process with AR(1) temporal innovations.
Fig. 6. The binary random field. Innovations are in the form of bit flips caused by binary symmetric channels between consecutive time instants.

Example 2: BSC innovations
Suppose that at $t = 0$ the process is a strip of bits that are i.i.d. Bernoulli with parameter $p_X$. Suppose that from $t$ to $t+1$ there is a nonzero probability $p_I$ that the bit $X_n^{(t)}$ is flipped. This amounts to a binary symmetric channel (BSC) between $X_n^{(t)}$ and $X_n^{(t+1)}$, as illustrated in Figure 6. The $t$ BSCs in series between $X_n^{(0)}$ and $X_n^{(t)}$ are equivalent to a single BSC with transition probability (see [23, p. 221, Problem 8])

$$ p_{I,t} = 0.5\left(1 - (1 - 2p_I)^t\right). \tag{36} $$

Note that for $0 < p_I < 1$ we have $\lim_{t \to \infty} p_{I,t} = 0.5$. So, for each $n$, the distribution of $X_n^{(t)}$ converges to the stationary distribution Bern(0.5). Substituting in (32) gives, for $p_I > 0$,

$$ H(V \mid W) = H(p_X)(1 - 2p_w) + (L-1) H(p_I) + \sum_{i=1}^{\infty} H(p_{I,2i}) \Pr\{T_{2i}\}. \tag{37} $$
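Equation (37) and the upper bound in (35) can be evaluated with a short script. The sketch below is not the authors' code: it assumes the Catalan first-return law $\Pr\{T_{2k}\} = 2C_{k-1}(p_w(1-p_w))^k$, truncates the sum at a large index `k_max` as described in the text, and uses hypothetical function names. $H(\cdot)$ of a scalar is the binary entropy function, as in $H(p_w)$.

```python
import math

def h2(p):
    """Binary entropy function in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def bsc_crossover(p_i, t):
    """Eq. (36): crossover probability of t cascaded BSC(p_i)."""
    return 0.5 * (1.0 - (1.0 - 2.0 * p_i) ** t)

def bsc_cond_entropy_rate(L, p_i, p_w, p_x=0.5, k_max=20000):
    """Truncated evaluation of (37), in bits per vector.

    Assumes Pr{T_2k} = 2 C_{k-1} (p_w (1 - p_w))^k (Catalan first-return law).
    """
    pq = p_w * (1.0 - p_w)
    f = 2.0 * pq                      # Pr{T_2}
    recurrent = 0.0
    for k in range(1, k_max + 1):
        recurrent += h2(bsc_crossover(p_i, 2 * k)) * f
        f *= pq * 2.0 * (2 * k - 1) / (k + 1)   # Pr{T_2k} -> Pr{T_{2k+2}}
    return h2(p_x) * (1.0 - 2.0 * p_w) + (L - 1) * h2(p_i) + recurrent

def entropy_rate_upper_bound(L, p_i, p_w, p_x=0.5):
    """Right-hand side of (35): H(p_w) + H(V|W)."""
    return h2(p_w) + bsc_cond_entropy_rate(L, p_i, p_w, p_x)

print(entropy_rate_upper_bound(L=8, p_i=0.05, p_w=0.5))
```

Two sanity checks follow from (37): with $p_I = 0$ the rate reduces to the static value $H(p_X)(1-2p_w)$, and with $p_I = 0.5$ every sample is fresh, so $H(V \mid W) = L$ bits when $p_X = 0.5$.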

Notice that when $p_I = 0$ we recover the static case. Using the above in (35), we obtain the corresponding bounds. Figure 7(a) illustrates the lower and upper bounds for $L = 8$ and $p_X = 0.5$. We compute the bounds using (32) and (35), truncating the infinite sum in (32) at a very large index. The probability $P_e$ is computed through Monte Carlo simulation using a simple Hamming distance detector. The bounds are surprisingly tight in this case and provide a good approximation of the true entropy rate. Notice that as $p_I$ increases, the entropy rate of the recurrent case ($p_w = 0.5$) crosses that of the panning case ($p_w = 0.05$). This is because in the recurrent case more bits are spent coding the innovations.
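The Monte Carlo estimate of $P_e$ can be sketched as follows. The text specifies only "a simple Hamming distance detector", so the decision rule below (pick the shift whose $L-1$ overlapping bits disagree least, ties going to +1), the step convention, and all function names are assumptions. Only $Y_t = (V_t, V_{t-1}, W_{t-1})$ is used, as in (34), and the bits at time $t-1$ are taken to be i.i.d. Bern(0.5), i.e., in steady state.

```python
import math
import random

def h2(p):
    """Binary entropy in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def estimate_pe_hamming(L, p_i, p_w, trials=20000, seed=0):
    """Monte Carlo P_e for guessing N_t from Y_t = (V_t, V_{t-1}, W_{t-1}).

    Assumed (hypothetical) setup: N_t = +1 w.p. 1 - p_w and -1 w.p. p_w; the
    bits at time t-1 are i.i.d. Bern(0.5); each bit flips w.p. p_i.
    """
    rng = random.Random(seed)
    errors = 0
    for _ in range(trials):
        # bits at locations W_{t-1}-1, ..., W_{t-1}+L (index 0 is W_{t-1}-1)
        prev = [rng.random() < 0.5 for _ in range(L + 2)]
        curr = [b ^ (rng.random() < p_i) for b in prev]   # BSC innovations
        step = 1 if rng.random() < 1.0 - p_w else -1
        v_prev = prev[1:L + 1]                             # V_{t-1}
        v_curr = curr[1 + step:L + 1 + step]               # V_t
        # Hamming distance on the L-1 overlapping samples under each hypothesis
        d_plus = sum(a != b for a, b in zip(v_curr[:-1], v_prev[1:]))
        d_minus = sum(a != b for a, b in zip(v_curr[1:], v_prev[:-1]))
        guess = 1 if d_plus <= d_minus else -1
        errors += guess != step
    return errors / trials

# Gap between the bounds in (35) for one (hypothetical) parameter choice:
pe = estimate_pe_hamming(L=8, p_i=0.05, p_w=0.5)
print(f"P_e ~ {pe:.4f}, H(P_e) ~ {h2(pe):.4f} bits")
```

The gap between the two sides of (35) is exactly $H(P_e)$, so a detector that errs rarely, as happens for large $L$ and small $p_I$, yields nearly coincident bounds.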

Figure 7(b) shows contour plots of the upper bound for various pairs $(p_I, p_w)$. The plot shows how the two innovations combine to generate a given entropy value. Notice that as $p_w$ approaches $1/2$, the entropy of the trajectory becomes significant and compensates for the smaller amount of spatial innovation.

To measure the effect of memory in the dynamic case, we evaluate the upper bound on the conditional entropy rate, computed as in the static case, and the upper bound on the true entropy rate given by (35). Figure 8 illustrates the difference between the conditional entropy upper bound and the true entropy upper bound. The curves are similar to the ones obtained in the static case with spatial innovation (Figure 4) and confirm the intuitive fact that memory is less useful when the surrounding scene changes rapidly.

Fig. 7. The binary symmetric innovations. (a) The curves show the lower and upper bounds on the entropy rate. Notice that the bounds are sharp for various values of $p_I$. (b) Contour plots of the upper bound for various $p_I$ and $p_w$. The lines indicate points of similar entropy but with different amounts of spatial and temporal innovation.

Fig. 8. Memory and innovations. Shown is the difference between the conditional entropy and the true entropy for the binary innovations with $p_X = 0.5$, $p_w = 0.5$, and $L = 8$. The curves show the intuitive fact that when the background changes too rapidly, there is little to be gained in bitrate by utilizing more memory.

Example 3: AR(1) innovations

Although the development leading to Theorem 3 was carried out for finite alphabets, the same calculation can be done for a random field taking values in $\mathbb{R}$, provided it has absolutely continuous joint densities. In this case, the entropies involved become differential entropies. For example, for each $n \in \mathbb{Z}$ and $0 < \rho < 1$, let

$$ X_n^{(t)} = \rho X_n^{(t-1)} + \epsilon_n^{(t)} $$

for $t \in \mathbb{Z}_+$, where the $\epsilon_n^{(t)} \sim \mathcal{N}(0, 1 - \rho^2)$ are i.i.d. and independent of $X$. Such a random field model is used, for instance, in [32] for bit allocation over multiple frames. Let $\phi(\sigma^2)$ denote the differential entropy of a Gaussian density with variance $\sigma^2$:

$$ \phi(\sigma^2) := \tfrac{1}{2}\log_2(2\pi e \sigma^2). $$

It is then easy to check that $h(X^{(0)}) = \phi(1)$ and $h(X^{(i)} \mid X^{(0)}) = \phi(1 - \rho^{2i})$, so that we obtain a lower and an upper bound on the differential entropy rate using Theorem 3. The conditional differential entropy rate $h(V \mid W)$ is

$$ h(V \mid W) = \phi(1)(1 - 2p_w) + (L-1)\phi(1 - \rho^2) + \sum_{i=1}^{\infty} \phi(1 - \rho^{4i}) \Pr\{T_{2i}\}. \tag{38} $$

The infinite sum on the right-hand side is well defined. Because $1 - \rho^{4k}$ converges to 1 as $k \to \infty$, we see that for any value of $\rho$ in $(-1, 1)$ the tail of the infinite sum is a sum of positive numbers. Using Lemma 1(i), we see that $\sum_{k=1}^{\infty} \Pr\{T_{2k}\} = 2p_w$. Because $\phi(\cdot)$ is concave, we can use Jensen's inequality as follows:

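$$ \sum_{k=1}^{\infty} \phi(1 - \rho^{4k}) \Pr\{T_{2k}\} = 2p_w \sum_{k=1}^{\infty} \phi(1 - \rho^{4k}) \frac{\Pr\{T_{2k}\}}{2p_w} \le 2p_w\, \phi\!\left( \sum_{k=1}^{\infty} (1 - \rho^{4k}) \frac{\Pr\{T_{2k}\}}{2p_w} \right). \tag{39} $$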

Using Lemma 1(ii) and the generating function for the Catalan numbers [33], one can further check that

$$ \sum_{k=1}^{\infty} (1 - \rho^{4k}) \Pr\{T_{2k}\} = \left(1 - 4(1 - p_w) p_w \rho^4\right)^{1/2} - (1 - 2p_w), $$
Fig. 9. Differential entropy bounds for the Gaussian AR(1) case as a function of the innovation parameter $\rho$. In this example, $P_e$ is small enough that the lower and upper bounds practically coincide. Note that the slope of the differential entropy curve is influenced by the value of $p_w$.



so that the last term is controlled by

$$ \sum_{k=1}^{\infty} \phi(1 - \rho^{4k}) \Pr\{T_{2k}\} \le 2p_w\, \phi\!\left( \frac{\left(1 - 4(1 - p_w) p_w \rho^4\right)^{1/2} - (1 - 2p_w)}{2p_w} \right). $$

The above upper bound turns out to be a very good approximation of the infinite sum in (38) when $p_w$ is close to 0 and when $\rho$ is away from 1.
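The sum in (38), its Catalan closed form, and the Jensen bound can be checked numerically with the sketch below. It reuses the assumed first-return law $\Pr\{T_{2k}\} = 2C_{k-1}(p_w(1-p_w))^k$ (consistent with the Catalan-number argument, not quoted from the lemma), truncates at a large index, and uses hypothetical function names.

```python
import math

def phi(var):
    """Differential entropy in bits of a scalar Gaussian with variance var."""
    return 0.5 * math.log2(2.0 * math.pi * math.e * var)

def first_return_probs(p_w, k_max):
    """[Pr{T_2}, ..., Pr{T_{2 k_max}}], assuming the Catalan first-return law."""
    pq = p_w * (1.0 - p_w)
    probs = [2.0 * pq]
    for k in range(1, k_max):
        probs.append(probs[-1] * pq * 2.0 * (2 * k - 1) / (k + 1))
    return probs

def ar1_recurrent_sum(rho, p_w, k_max=20000):
    """Truncated sum_k phi(1 - rho^{4k}) Pr{T_2k} from (38) and its Jensen bound."""
    probs = first_return_probs(p_w, k_max)
    exact = sum(phi(1.0 - rho ** (4 * k)) * f
                for k, f in enumerate(probs, start=1) if f > 0.0)
    closed = math.sqrt(1.0 - 4.0 * (1.0 - p_w) * p_w * rho ** 4) - (1.0 - 2.0 * p_w)
    jensen = 2.0 * p_w * phi(closed / (2.0 * p_w))
    return exact, jensen

def ar1_cond_entropy_rate(L, rho, p_w):
    """h(V|W) from (38), in bits per vector (truncated sum)."""
    exact, _ = ar1_recurrent_sum(rho, p_w)
    return phi(1.0) * (1.0 - 2.0 * p_w) + (L - 1) * phi(1.0 - rho ** 2) + exact

print(ar1_recurrent_sum(rho=0.9, p_w=0.1))   # (truncated exact sum, Jensen bound)
```

Since $\phi$ is concave, the second value should never fall below the first; the size of the gap indicates how good the approximation is for a given $(\rho, p_w)$.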

Notice that for large $L$ and $\rho$ close to 1, $P_e$ and $H(P_e)$ are small, so that the bounds in Theorem 3 are sharp. Figure 9 displays the bounds on the differential entropy rate as a function of $\rho$. The bounds are computed following Theorem 3 and (38). Here $P_e$ is inferred via Monte Carlo simulation with $10^7$ trials and a minimum-MSE detector for $W_t$. The inferred $P_e$ is so low that the lower and upper bounds practically coincide. Analytical computation of $P_e$ is a detection problem beyond the scope of this paper.

C. Lossy information rates for the AR(1) random field

Consider the AR(1) innovations of the previous example. Under the MSE distortion measure, it is possible to derive an upper bound on the lossy information rate. The key is to compute $R_{V^t \mid W^t}(D)$, defined as in (23), and use the upper bound (30):

$$ R_{V^t}(D) \le H(p_w) + R_{V^t \mid W^t}(D), \tag{40} $$

for each $t > 0$. The conditional rate-distortion function satisfies the Shannon lower bound (SLB) [20]:

$$ R_{V^t \mid W^t}(D) \ge \frac{1}{t} h(V^t \mid W^t) - L\phi(D). \tag{41} $$

The key observation is that, for a given fixed trajectory $w^t$, the rate-distortion function of $V^t$ is that of a Gaussian vector consisting of the samples of the random field covered by $w^t$. For a Gaussian vector, the SLB is tight when the per-sample distortion is less than the minimum eigenvalue of the covariance matrix (see [20, p. 111]). The next proposition gives a condition under which (41) is tight and thus, when combined with (40), provides an upper bound on the rate-distortion function.

Proposition 2: Consider the vector process $V$ resulting from the Gaussian AR(1) random field with correlation coefficient $0 < \rho < 1$ and a Bernoulli random walk with probability $p_w \le 1/2$. The Shannon lower bound for the conditional rate-distortion function is tight whenever the distortion satisfies

$$ 0 < D \le \frac{1 - \rho}{1 + \rho}. \tag{42} $$
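Condition (42) translates into a minimum SNR for a unit-variance field, and combining (40) with a tight (41) gives a rate curve of the form $H(p_w) + h(V \mid W) - L\phi(D)$. The sketch below assumes that $\frac{1}{t} h(V^t \mid W^t) \to h(V \mid W)$ as $t \to \infty$ (by the same Cesàro argument used for the lossless bound); the caller supplies $h(V \mid W)$ from (38), and the function names are hypothetical.

```python
import math

def h2(p):
    """Binary entropy in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def phi(var):
    """Differential entropy (bits) of a scalar Gaussian with variance var."""
    return 0.5 * math.log2(2.0 * math.pi * math.e * var)

def slb_threshold_snr_db(rho):
    """SNR (dB) at the edge of (42) for a unit-variance field: 10 log10(1/D_max)."""
    return 10.0 * math.log10((1.0 + rho) / (1.0 - rho))

def rate_upper_bound(D, L, rho, p_w, h_cond):
    """Sketch of H(p_w) + h(V|W) - L phi(D), in bits per vector.

    h_cond stands for h(V|W) from (38) and must be supplied by the caller.
    Only meaningful when D satisfies (42), where the SLB is tight.
    """
    if not 0.0 < D <= (1.0 - rho) / (1.0 + rho):
        raise ValueError("D outside (42): the Shannon lower bound may not be tight")
    return h2(p_w) + h_cond - L * phi(D)

# about 22.99 dB and 12.79 dB, matching the 23 dB and 12.8 dB quoted for Fig. 10
print(slb_threshold_snr_db(0.99), slb_threshold_snr_db(0.9))
```

Halving $D$ (about 3 dB of SNR) lowers $\phi(D)$ by exactly 1/2 bit, so along this curve each 3 dB costs $L/2$ bits per vector.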

To prove the claim, we rely on the following lemmas:

Lemma 3: Let $X_1, X_2, \ldots, X_m$ be a sequence of Gaussian vectors in $\mathbb{R}^d$ such that $X_j \sim \mathcal{N}(0, C_j)$, where each $C_j$ has spectrum $\lambda(C_j)$. Let $W$ be a random variable independent of $X_1, \ldots, X_m$ such that $\Pr\{W = j\} = \mu_j$ for $j = 1, \ldots, m$. Consider the mixture

$$ X = \sum_{j=1}^{m} \mathbf{1}_{\{W = j\}} X_j. \tag{43} $$

Denote by $R_{X \mid W}(D)$ the conditional rate-distortion function with per-sample MSE distortion $D$. Then, if

$$ D \le \min \bigcup_{j=1}^{m} \lambda(C_j), \tag{44} $$

the conditional rate-distortion function is

$$ R_{X \mid W}(D) = \sum_{j=1}^{m} \mu_j R_{X_j}(D). \tag{45} $$

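Proof: Let $p(\hat{X}, X \mid W)$ be such that $d^{-1} E\|X - \hat{X}\|^2 \le D$. Then,

$$
\begin{aligned}
I(X; \hat{X} \mid W) &= \sum_{j=1}^{m} \mu_j I(X; \hat{X} \mid W = j) \qquad (46) \\
&\ge \sum_{j=1}^{m} \mu_j R_{X_j}(D_j), \qquad (47)
\end{aligned}
$$

with

$$ d^{-1} E\|X - \hat{X}\|^2 = \sum_{j=1}^{m} \mu_j D_j \le D, \tag{48} $$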
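and $D_j := d^{-1} E(\|X - \hat{X}\|^2 \mid W = j)$. The above is minimized, subject to (48), when

$$ R'_{X_j}(D_j) = \theta \quad \text{for all } j, \tag{49} $$

where $\theta$ is some constant. Suppose $D \le \min \bigcup_{j=1}^{m} \lambda(C_j)$ and $D_j = D$. We have

$$ R_{X_j}(D_j) = \frac{1}{2d} \sum_{p=1}^{d} \log_2 \frac{\lambda_{j,p}}{D_j}, \tag{50} $$

where $\lambda_{j,p}$ are the eigenvalues of $C_j$; moreover, $R'_{X_j}(D_j) = -\frac{1}{2D_j \ln 2} = -\frac{1}{2D \ln 2}$ for every $j$, so that the conditions for a minimum are satisfied. The lower bound can be attained by setting

$$ p(X, \hat{X} \mid W = j) = p_j^*(X_j, \hat{X}_j), $$

where $p_j^*(X_j, \hat{X}_j)$ attains $R_{X_j}(D_j)$. ∎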
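The allocation step in the proof can be illustrated numerically: below every eigenvalue, (50) applies to each component, and the equal split $D_j = D$ gives a smaller mixture rate than any other split satisfying (48). The spectra, weights, and function names below are hypothetical values chosen for illustration.

```python
import math

def gaussian_rd_per_sample(eigs, D):
    """Eq. (50): per-sample R(D) in bits of N(0, C) with spectrum `eigs`.

    Valid only when D <= min(eigs), where no reverse water-filling is needed.
    """
    if D > min(eigs):
        raise ValueError("D exceeds the smallest eigenvalue; (50) does not apply")
    return sum(math.log2(lam / D) for lam in eigs) / (2 * len(eigs))

def mixture_rate(spectra, mu, dists):
    """sum_j mu_j R_{X_j}(D_j) for a given allocation D_j, as in (47)."""
    return sum(m * gaussian_rd_per_sample(e, d) for m, e, d in zip(mu, spectra, dists))

# Hypothetical example: two components in R^2, D below every eigenvalue (44).
spectra = [[1.0, 0.5], [2.0, 0.6]]
mu = [0.3, 0.7]
D = 0.4
equal = mixture_rate(spectra, mu, [D, D])          # allocation D_j = D, eq. (45)
delta = 0.01                                       # keeps sum_j mu_j D_j = D, eq. (48)
skewed = mixture_rate(spectra, mu, [D + delta / mu[0], D - delta / mu[1]])
print(equal, skewed)   # the equal allocation gives the smaller rate
```

Because each $R_{X_j}(D_j)$ equals a constant minus $\frac{1}{2}\log_2 D_j$ in this regime, the comparison reduces to the concavity of the logarithm.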
Lemma 4 ([34, p. 189]): Let $A$ be an $n \times n$ Hermitian matrix, and let $1 \le m \le n$. Let $A_m$ denote a principal submatrix of $A$, obtained by deleting $n - m$ rows and the corresponding columns of $A$. Then, for each integer $k$ such that $1 \le k \le m$, we have

$$ \lambda_k(A) \le \lambda_k(A_m), \tag{51} $$

where $\lambda_k(A)$ denotes the $k$th smallest eigenvalue of matrix $A$.



Proof of Proposition 2: The SLB for each $t > 0$ is given by

$$ R_{V^t \mid W^t}(D) \ge \frac{1}{t} h(V^t \mid W^t) - L\phi(D). \tag{52} $$

In view of Lemma 3, it suffices to show that for each $t > 0$, each path $w^t$, and $0 < D \le \frac{1-\rho}{1+\rho}$, the Shannon lower bound for the rate-distortion function of $V^t$ given $W^t = w^t$ is achievable. Given $W^t = w^t$, this bound is attainable if $D$ is smaller than the minimum eigenvalue of the covariance matrix of the random field samples covered by $w^t$. Denote this covariance by $C_{w^t} := \operatorname{Cov}(V^t \mid w^t)$. Because the random field is independent in the spatial dimension $n$, the spectrum of $C_{w^t}$ is the disjoint union of the spectra of the covariance matrices of the samples of $V^t$ at the same location $n$. Each of these matrices is a principal submatrix of the $t \times t$ Toeplitz matrix $T_t(\rho)$ with entries $[T_t(\rho)]_{ij} = \rho^{|i-j|}$. Since $\lambda_{\min}(T_t(\rho))$ decreases to $(1-\rho)/(1+\rho)$ as $t \to \infty$ [35], applying Lemma 4 we conclude that

$$ \lambda_{\min}(C_{w^t}) \ge \lambda_{\min}(T_t(\rho)) \ge \frac{1-\rho}{1+\rho}. \tag{53} $$

Therefore, the bound (52) is achievable for each $t$, and since the limit of $R_{V^t \mid W^t}(D)$ exists, it follows that the bound is achievable as $t \to \infty$. ∎
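The eigenvalue facts used in (53) are easy to check numerically. The sketch below (NumPy, with hypothetical helper names and visit times) builds $T_t(\rho)$, confirms that its smallest eigenvalue is nonincreasing in $t$ and stays above $(1-\rho)/(1+\rho)$, and compares a principal submatrix, such as the covariance of one location visited at irregular times, against the full matrix as in Lemma 4 with $k = 1$.

```python
import numpy as np

def ar1_toeplitz(t, rho):
    """t x t matrix T_t(rho) with entries rho^{|i-j|}."""
    idx = np.arange(t)
    return rho ** np.abs(np.subtract.outer(idx, idx))

def min_eig(a):
    """Smallest eigenvalue of a symmetric matrix (eigvalsh sorts ascending)."""
    return float(np.linalg.eigvalsh(a)[0])

rho = 0.9
limit = (1.0 - rho) / (1.0 + rho)
for t in (2, 8, 32, 128):
    print(t, min_eig(ar1_toeplitz(t, rho)), limit)

# Hypothetical visit times of one location: a principal submatrix of T_10(rho).
times = [0, 3, 4, 9]
sub = ar1_toeplitz(10, rho)[np.ix_(times, times)]
print(min_eig(sub), min_eig(ar1_toeplitz(10, rho)))   # Lemma 4 with k = 1
```

The limit $(1-\rho)/(1+\rho)$ is the minimum of the AR(1) spectral density $(1-\rho^2)/(1 - 2\rho\cos\omega + \rho^2)$, attained at $\omega = \pi$.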
Example 4: We simulate the AR(1) dynamic reality model. To compress the process $V_t$, we estimate the trajectory and send it as side information. With the trajectory at hand, we encode the samples with DPCM, encoding the residual with entropy-constrained scalar quantization (ECSQ). We build two encoders. In the first, prediction uses only the previously encoded vector sample; in the second, all encoded samples up to time $t$ are available to the encoder (and decoder). Figure 10 illustrates the SNR as a function of rate for block length $L = 8$. In Figure 10(a) and (b) we have $\rho = 0.99$, and the upper bound is valid for SNR greater than 23 dB. In Figure 10(a), we have $p_w = 0.5$. Because the scene changes slowly and is highly recurrent, the infinite-memory encoder ($M = \infty$) is about 3.5 dB better than the encoder with $M = 1$. The same behavior is not observed when the scene is not recurrent (panning case, $p_w = 0.1$, Figure 10(b)) or when the background changes too rapidly ($\rho = 0.9$, Figure 10(c)).

V. Conclusion 

We have proposed a stochastic model for the plenoptic function that enables the precise computation 
of information rates. For the static case, we provided lossless and lossy information rate bounds that are 
tight in a number of interesting cases. In some scenarios, the theoretical results support the ubiquitous 
hybrid coding paradigm of extracting motion and coding a motion compensated sequence. 

We extended the model to account for changes in the background scene, and computed bounds for 
the lossless and lossy information rates for the particular case of AR(1) innovations. The bounds for this 
"dynamic reality" are tight in some scenarios, namely when the background scene changes slowly with 
time (i.e., $\rho$ close to 1).

The model explains precisely how long-term motion prediction helps coding in both the static and dynamic cases. In the dynamic model, this is related to the two parameters $(p_w, \rho)$, which represent the rate of recurrence in motion and the rate of change in the scene. As $(p_w, \rho) \to (0.5, 1)$, long-term memory predictions result in significant improvements (in excess of 3.5 dB). By contrast, if either $\rho$ is away from 1 or $p_w$ is away from 0.5, long-term memory brings little improvement.
(Figure 10(c) legend: $R_{UB}(D)$ with $p_w = 0.5$, $\rho = 0.9$; DPCM & ECSQ with $M = \infty$ and with $M = 1$.)

Fig. 10. Performance of DPCM with motion for various $\rho$ and $p_w$. For $\rho = 0.99$ and $\rho = 0.9$, the upper bound is valid for SNR greater than 23 dB and 12.8 dB, respectively. (a) Memory provides considerable gains, $p_w = 0.5$, $\rho = 0.99$. (b) Modest gains when $p_w = 0.1$. (c) Modest gains when $\rho = 0.9$, as the background changes too rapidly.



Although we developed the results for the Bernoulli random walk, the model can be generalized to other random walks on $\mathbb{Z}$ and $\mathbb{Z}^2$. Our current work includes such generalizations. It also includes estimating $\rho$ and $p_w$ for real video signals and fitting the model to such signals.